Abstract: When we look at images, certain salient structures often attract our immediate 
attention without requiring a systematic scan of the entire image. In subsequent 
stages, processing resources can be allocated preferentially to these salient structures. In 
many cases this saliency is a property of the structure as a whole, i.e., parts of the structure 
are not salient in isolation. In this paper we present a saliency measure based on curvature 
and curvature variation. The structures this measure emphasizes are also salient in human 
perception, and they often correspond to objects of interest in the image. We present a 
method for computing the saliency by a simple iterative scheme, using a uniform network of 
locally connected processing elements. The network uses an optimization approach to produce 
a "saliency map," which is a representation of the image emphasizing salient locations. 
The main properties of the network are: (i) the computations are simple and local, (ii) globally 
salient structures emerge with a small number of iterations, and (iii) as a by-product 
of the computations, contours are smoothed and gaps are filled in. 
§1 Introduction 

Salient structures can often be perceived in an image at a glance. They appear 
to attract our attention without the need to scan the entire image in a systematic 
manner, and without prior expectations regarding their shape. The processes involved 
in the perception of salient structures appear to play a useful role in segmentation and 
recognition, since they allow us to immediately concentrate on objects of interest in the 
image. 

Consider the images in figures 1, 2 and 3. Certain objects in each image somehow 
attract our attention in a manner often described as 'preattentive'. For instance, the 
large blobs in Fig. 1a and 1b are prominent, although locally the blobs' contours are 
indistinguishable from background contours on the basis of local orientation, curvature, 
contrast, etc. It seems as if one must somehow capture most of the curve bounding a blob 
in order to perceive it as a prominent structure. The circle in Fig. 2 is immediately 
perceived although its contour is fragmented, implying that gaps do not hinder the 
immediate perception of such objects. In this case one must group together several 
line segments of the circle to distinguish it from the background. These examples also 
demonstrate that these prominent objects need not be recognized in order for them to be 
distinguished. The image in Fig. 3 is an edge image of a car in a cluttered background. 
Our attention is drawn immediately to the region of interest in the image. It seems that 
the car need not be recognized to attract our attention. When the image is inverted 
and presented for short periods, recognition becomes considerably more difficult, yet 
the same region remains salient. 

The goal of this paper is to suggest what makes structures such as those in Figs. 
1 to 3 salient, and to propose a mechanism for detecting salient locations in an image. A 
locally connected network is proposed that can process images such as the figures above 
to construct a "saliency map", which is a representation of the image emphasizing salient 
locations. The computations of the net are devised to meet the following requirements: 
(i) the time it takes to detect a prominent structure does not depend on the complexity 
of background curves, (ii) curves may have any number of gaps, and (iii) the number of 
computations is restricted to the order of dozens or, at most, about a hundred steps 
in order to meet the time constraint involved in immediate perception. 

Issues related to this problem include segmentation, perceptual organization, and 
figure/ground separation. Segmentation schemes have been investigated extensively in 
the field of computer vision and many algorithms have been suggested. They will not 
be reviewed here, since they are only marginally related to the problem at hand. Many 
of the segmentation processes that have been proposed were more ambitious than what 
is required, or what is possible, to achieve in the early stages where prominent areas 
are located. For example, they attempt to segment the entire image instead of just an 
area of interest. Our proposal is related to the suggestion made by Ullman (1986) that 
segmentation should be conducted on an area of interest rather than applied to the 
entire image, implying that some preattentive process is required to detect prominent 
locations from which an area of interest is defined, prior to the act of segmentation. 

Figure 1a. Three prominent blobs are perceived immediately and with little effort. Locally, the 
blobs are similar to the background contours (adapted from Mahoney (1986)). 

Figure 1b. Intersections were added to illustrate that the blobs are not distinguished by virtue 
of their intersections with the background curves. 

Figure 2. A circle in a background of 200 randomly placed and oriented segments. The circle is 
still perceived immediately although its contour is fragmented. 

Figure 3. An edge image of a car in a cluttered background. Our attention is drawn immediately 
to the region of interest. It seems that the car need not be recognized to attract our 
attention. The car also remains salient when parallel lines and small blobs are removed, 
and when the less textured region surrounding parts of the car is filled in with more 
texture. 

Lowe's (1985) treatment of perceptual organization is more closely related to the 
problem addressed in this paper. The processes proposed by Lowe detect instances 
of collinearity, co-termination and parallelism among straight lines, and will not be 
effective in cases (e.g. Fig. 1) where these conditions do not play a major role. Most past 
approaches to segmentation also do not meet the requirements set above. In particular, 
they do not meet the time constraint and they depend critically on the complexity of 
the background curves. 



1.1 Local and Global Saliency 



The phenomena related to the perception of salient structures can be roughly di- 
vided into two classes. The first, referred to as local saliency, occurs when an element 
becomes conspicuous by having a simple distinguishing local property such as color, 
contrast, orientation, etc. For example, a red item placed among green ones immediately 
attracts attention by virtue of its unique color (Treisman and Gelade 1980; Julesz 
1981). The second case, referred to as structural saliency, occurs when the structure is 
perceived in a more global manner. That is, the local elements of the structure are not 
salient as in the former case but instead the arrangement of the elements is what makes 
the structure unique and salient. 

We focus below on the saliency of curves, based on properties measured along 
them (the curves may be continuous or with any number of gaps). Not all phenomena 
of global immediate perception are necessarily accounted for by measuring properties of 
curves. For instance, one could measure the compactness of a structure, the degree of 
symmetry it contains and other measures that are region-based rather than curve-based. 
Nevertheless, properties of curves are often sufficient in order to separate objects from 
their background. 

The fact that structural saliency requires measures that have a global extent in- 
troduces a severe complexity problem. The number of possible groupings of local line 
segments into curves, where the curves are allowed to have any number of gaps, explodes 
exponentially. The complexity issue becomes acute when considering the fact that a 
salient curve of a given length is not necessarily composed of salient sub-parts. Thus, 
contemporary pyramid techniques (see Rosenfeld 1986 for a review) would not be ap- 
propriate for detecting structural saliency, because they contain an implicit assumption 
that a salient curve is composed of salient sub-parts. 



§2 Measuring Saliency as an Optimization Problem 

Our goal is to construct a saliency map which is a representation of the image 
emphasizing salient locations. We seek to associate, therefore, a measure of saliency 
denoted by the function Φ(·) to each location in the image. A property that seems to 
play a role in structural saliency is the combination of length and smoothness measured 
at a particular scale. That is, a measure of saliency that would account for the type of 
images above is one that favors long smooth curves, where the smoothness of a curve 
is related to its curvature or its curvature variation. We therefore face the following 
problems: 

(1) Defining an appropriate measure Φ that, when applied to a point along a given 
curve, will increase when the curve increases in length and smoothness. 

(2) A selection problem. The measure Φ(P) depends on the curve passing through 
P. Since the curves we are considering are either continuous or separated by 
any number of gaps, there will usually be many possible curves to consider. Our 
approach to this problem will be to select the curve that maximizes Φ(P) over 
all curves passing through P. 

We defer the exact formulation of Φ until we have examined the manner by which it 
is computed. The reason is that the general method of computing Φ (using a simple local 
network) places strong constraints on the possible definition of Φ. In the next sections 
we describe the mechanism by which Φ is computed, and then derive an explicit formula 
for Φ. 



2.1 The Basic Elements 

We assume that Φ is computed by a locally connected network of processing elements. 
Our specific model is that at the level of computing saliency the image is 
represented by a network of n x n grid points, where each point represents a specific 
x, y location in the image. At each point P there are k orientation elements coming 
into P from neighboring points, and the same number of orientation elements leaving P 
to nearby points. Each orientation element p_i responds to an input image by signalling 
the presence of the corresponding line segment in the image, so that those elements 
that do not have an underlying line segment are associated with an empty area or gap 
in the image. We refer to a connected sequence of orientation elements p_i,...,p_{i+N}, 
each element representing a line-segment or a gap, as a curve of length N (note that 
curves may be continuous or with any number of gaps). The optimization problem is 
formulated as maximizing Φ_N over all curves of length N starting from p_i: 

\[ \max_{(p_{i+1},\ldots,p_{i+N}) \in \delta^N(p_i)} \Phi_N(p_i,\ldots,p_{i+N}) \]

where δ^N(p_i) is the set of all possible curves of length N starting from p_i. 



A naive approach to this problem would involve an exhaustive enumeration of all 
combinations of p_{i+1},...,p_{i+N}, which would require an exponential search space of size 
k^N for each element in the network. In what follows, we will show that for a certain 
class of measures Φ ("extensible" measures), the computation becomes linear in N. We 
will then define a saliency measure Φ that measures length and smoothness, and at the 
same time is extensible and can be computed efficiently. 



2.2 Multistage Optimization Approach 

For a certain class of measures Φ(·), the computation of Φ_N can be obtained by 
iterating a simple local computation. To illustrate, let us consider first curves that are 
only three elements long. The problem in this case is: 

\[ \max_{(p_{i+1},p_{i+2}) \in \delta^2(p_i)} \Phi_2(p_i,p_{i+1},p_{i+2}) \]

That is, for a given element p_i, determine p_{i+1} (one of p_i's k neighbors) and p_{i+2} (a 
neighbor of p_{i+1}) such that Φ_2(p_i,p_{i+1},p_{i+2}) will be maximal. A naive approach will 
again require examining the k^2 different curves. Assume, however, that Φ_2 satisfies the 
condition: 

\[ \max_{\delta^2(p_i)} \Phi_2(p_i,p_{i+1},p_{i+2}) = \max_{p_{i+1}} \Phi_1\bigl(p_i, \max_{p_{i+2}} \Phi_1(p_{i+1},p_{i+2})\bigr) \]

In this case maximizing Φ_2 can be achieved by repeating the application of Φ_1 over 
shorter curves. The general approach is formulated in a similar manner: 

\[ \max_{\delta^N(p_i)} \Phi_N(p_i,\ldots,p_{i+N}) = \max_{p_{i+1} \in \delta(p_i)} \Phi_1\bigl(p_i, \max_{\delta^{N-1}(p_{i+1})} \Phi_{N-1}(p_{i+1},\ldots,p_{i+N})\bigr) \tag{2.1} \]

where δ(p_i) stands for δ^1(p_i). In this manner we reduce the search space needed for 
each curve of length N starting from p_i to the size of kN instead of the k^N that is needed 
for the naive approach. The principle in (2.1) is related to the principle of optimality 
underlying all multistage decision processes, and in particular it is a special case of 
Dynamic Programming. We refer to the family of functions that obey the principle in 
(2.1) as extensible functions. We next derive an extensible function that prefers long 
curves that have low total curvature. 
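
The reduction from k^N to kN evaluations can be checked on a toy example. The sketch below is illustrative only: the four-element network, the coupling values, and the nested score (local value plus coupling times the score of the remaining curve) are assumptions chosen to give one simple extensible function, and the couplings are kept nonnegative so that the inner and outer maxima can be exchanged.

```python
def best_by_enumeration(start, n_links, neighbors, local, coupling):
    """Score every curve of n_links links leaving `start`; return the maximum."""
    def score(path):
        # Nested form: score(p0..pN) = local[p0] + c(p0, p1) * score(p1..pN)
        total = local[path[-1]]
        for a, b in zip(reversed(path[:-1]), reversed(path[1:])):
            total = local[a] + coupling[(a, b)] * total
        return total

    def extend(path):
        if len(path) == n_links + 1:
            return score(path)
        return max(extend(path + [q]) for q in neighbors[path[-1]])

    return extend([start])


def best_by_stages(n_links, neighbors, local, coupling):
    """Same maximum for every start element, using the recursion in (2.1)."""
    best = dict(local)  # curves with zero links
    for _ in range(n_links):
        best = {
            p: local[p] + max(coupling[(p, q)] * best[q] for q in neighbors[p])
            for p in neighbors
        }
    return best


# Hypothetical toy network: 4 elements, k = 2 neighbors each.
neighbors = {0: [1, 2], 1: [2, 3], 2: [3, 0], 3: [0, 1]}
local = {0: 1.0, 1: 0.0, 2: 1.0, 3: 1.0}
coupling = {(0, 1): 0.9, (0, 2): 0.4, (1, 2): 0.8, (1, 3): 0.5,
            (2, 3): 0.95, (2, 0): 0.3, (3, 0): 0.7, (3, 1): 0.6}

staged = best_by_stages(6, neighbors, local, coupling)
enumerated = {p: best_by_enumeration(p, 6, neighbors, local, coupling)
              for p in neighbors}
```

Here the enumeration visits 2^6 curves per start element, while the staged version performs 6 passes of 2 comparisons per element; both should return the same maxima.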



2.3 Deriving an extensible Function for Measuring Saliency 

We next derive an expression for the saliency of an element p_i on a curve γ = 
p_i,...,p_{i+N}. γ is a curve of length N, terminating at p_i. (For a non-end element, the 
saliency is the sum of the contributions of the two sides.) Note that the saliency measure 
is associated with each element, not with the entire curve. Two elements along the same 
curve may have different saliency measures, depending on their position. 

The saliency measure at p_i developed below has the general form Σ_j w_{ij} σ_j, where 
j ranges over all the elements lying on γ. That is, the saliency at p_i is a weighted sum 
of the contributions from all the elements lying on the same curve. 

Two factors play a role in the measure of saliency. The first factor is related to the 
length of the curve, and the second factor is related to its shape. The length of a curve 
is determined by the number of elements on the curve that have an actual curve (rather 
than a gap) passing through them. These elements are referred to as active elements, 
whereas the elements that are associated with gaps are referred to as virtual elements. 
To each element p_i we associate its local saliency σ_i. If p_i is an active element, then σ_i 
is set to be a positive value, which for the present is set to 1, and for a virtual element 
σ_i is set to 0. The measure related to the length of the curve p_i,...,p_{i+N} is: 

\[ \sum_{j=i}^{i+N} \sigma_j \tag{2.2} \]

The measure above is a sum of the local saliency values of the active elements along 
the curve. Σ σ_j is in the range of 0 to N + 1 depending on the number of active 
elements, implying that a continuous curve scores higher than a fragmented one of the 
same length. It is also possible to 'penalize' the existence of gaps, especially large 
ones, in order to attenuate the measure given to the curve when it is too fragmented. 
Penalizing the existence of gaps is obtained by associating an attenuation factor ρ_i with 
each element p_i. The value of ρ_i determines how quickly the contribution to the saliency 
from neighboring elements along the curve decays with distance. It is reasonable to use 
only two values for ρ_i, depending on whether p_i is an active or virtual element. If p_i is 
active then ρ_i is set to a value smaller than or equal to 1 (for the present it is set to 1). If p_i 
is virtual, then ρ_i = ρ < 1. We then define an attenuation function associated with the 
curve p_i,...,p_j as follows: 

\[ \rho_{i,j} = \prod_{k=i+1}^{j} \rho_k \]

where ρ_{i,i} = 1. The measure in (2.2) is modified by the attenuation factors: 

\[ \sum_{j=i}^{i+N} \rho_{i,j}\,\sigma_j \tag{2.3} \]

The measure in (2.3) is a weighted contribution of the local saliency values σ_j along the 
curve, where the weights are inversely related to the number of virtual elements along 
p_i,...,p_j. 
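
As a small numerical illustration of (2.3), the sketch below evaluates the attenuated length measure at the first element of a curve given as a list of active/virtual flags. The function name and the list encoding are hypothetical; the default value 0.7 for the attenuation of virtual elements is the one quoted for the implementation in section 3.2.

```python
def attenuated_length(active, rho=0.7):
    """Measure (2.3) at the first element of a curve.

    active: list of booleans, True for an active element and False for a
            virtual element (gap). Active elements have sigma = 1 and
            attenuation 1; virtual elements have sigma = 0 and attenuation rho.
    """
    total = 0.0
    weight = 1.0  # attenuation function of the first element with itself is 1
    for j, is_active in enumerate(active):
        if j > 0:
            # Multiply in the attenuation factor of element j.
            weight *= 1.0 if is_active else rho
        total += weight * (1.0 if is_active else 0.0)
    return total


continuous = attenuated_length([True, True, True, True])
fragmented = attenuated_length([True, False, False, True])
```

With these inputs the continuous curve scores its full length, while the element beyond the two-element gap contributes only rho squared, so the fragmented curve scores lower than under the unattenuated sum (2.2).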

In order to measure the shape of the curve we use a measure that is inversely 
related to the total curvature of the curve. The total curvature of a curve γ is defined 
as ∫_γ (dθ/ds)^2 ds, where θ(s) is the slope along the curve, and dθ/ds at point P is known as 
the local curvature at that point (the inverse of R, the radius of curvature). We would 
like to use the total curvature to obtain a measure that is bounded, and is inversely 
related to the total curvature. The following measure meets these requirements: 

\[ e^{-\int_\gamma (d\theta/ds)^2\,ds} \tag{2.4} \]

which is confined to values between 0 and 1. A straight line receives the value 1, and a 
meandering curve will approach the limit 0 as its total curvature grows to infinity. To 
obtain a discrete approximation to the measure in (2.4) we denote by α_k the orientation 
difference between the k'th element and its successor, and by Δs the length of an 
orientation element. The local curvature 1/R of the arc tangent to these elements (see 
Fig. 4) is: 

\[ \frac{1}{R} = \frac{2\tan(\alpha_k/2)}{\Delta s} \]

The arc's length is α_k R, and therefore the squared curvature integrated along the arc 
is approximated by: 

\[ \frac{2\alpha_k \tan(\alpha_k/2)}{\Delta s} \]

The discrete approximation to the total curvature measure along p_i,...,p_j is therefore 
obtained by: 

\[ C_{i,j} = \prod_{k=i}^{j-1} f_{k,k+1} \]

where 

\[ f_{k,k+1} = e^{-2\alpha_k \tan(\alpha_k/2)/\Delta s} \tag{2.5} \]

C_{i,j} plays the role of a weight given to each local saliency value σ_j along the curve. A 
measure that gives a high score to long curves with low total curvature is now defined 
as: 

\[ \sum_{j=i}^{i+N} C_{i,j}\,\rho_{i,j}\,\sigma_j \tag{2.6} \]

The measure in (2.6) is a weighted contribution of the local saliency values σ_j along the 
curve. Each weight is a product of two factors. The first factor is inversely related to 
the number of virtual elements along p_i,...,p_j, and the second factor is inversely related 
to the total curvature of the curve. Curves that will receive a high measure on (2.6) 
are long curves that are as straight as possible and have the least number of gaps. The 
measure in (2.6) is also extensible according to the definition in (2.1). This can be shown 
by induction on the length of the curve, and the proof will not be detailed here. 
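
To make the measure in (2.6) concrete, the following sketch evaluates it at the first element of a single, explicitly given curve. The function names, the list encoding of the curve, and the unit element length are assumptions for illustration; the coupling follows (2.5) and the attenuation follows the product used in (2.3).

```python
import math


def coupling(alpha, ds=1.0):
    """Coupling of (2.5) for an orientation difference alpha (radians, |alpha| < pi)."""
    return math.exp(-2.0 * alpha * math.tan(alpha / 2.0) / ds)


def saliency_along_curve(active, alphas, rho=0.7, ds=1.0):
    """Measure (2.6) at the first element of one given curve.

    active[j]: True if element j is active (sigma = 1), False if virtual (sigma = 0).
    alphas[j]: orientation difference between element j and element j + 1.
    """
    if len(alphas) != len(active) - 1:
        raise ValueError("need one orientation difference per pair of successive elements")
    total, curv_weight, gap_weight = 0.0, 1.0, 1.0
    for j, is_active in enumerate(active):
        if j > 0:
            curv_weight *= coupling(alphas[j - 1], ds)   # running product of couplings
            gap_weight *= 1.0 if is_active else rho      # running product of attenuations
        total += curv_weight * gap_weight * (1.0 if is_active else 0.0)
    return total


straight = saliency_along_curve([True] * 5, [0.0] * 4)
bent = saliency_along_curve([True] * 5, [0.3] * 4)
gapped = saliency_along_curve([True, False, True, True, True], [0.0] * 4)
```

A straight continuous curve of five elements scores 5; bending it or opening a gap lowers the score, which is the ordering the measure is meant to produce.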

Other functions for measuring the optimality of curves, using multistage optimization, 
were suggested by Ballard and Sklansky (1976), Martelli (1976) and Montanari 
(1971). The optimal curve in these cases is one that maximizes the sum of gray levels 
or edge magnitude along the curve, while minimizing the sum of orientation difference. 
In our terminology, the optimization function is: 

\[ \sum_{j=i}^{i+N} \sigma_j - \sum_{j=i}^{i+N} \alpha_j \]

This measure, however, is insensitive to the distribution of orientation difference along 
the curve and in general does not satisfy the requirement to prefer long and 
as-straight-as-possible curves. 




Figure 4. A discrete approximation to the curvature. R approximates the radius 
of curvature, α is the orientation difference, Δs is the length of both 
elements. 



§3 The Saliency Network 

In this section we summarize the computation performed by the network and its 
relation to the saliency measure defined above. The orientation elements constitute the 
basic computing elements of the net. Each element pi is associated with a processor 
that can perform some computation based on its state and the state of its k neighboring 
processors. This defines a uniform network containing kn^2 processing units, with local 
communication. In the current implementation k is equal to 16, providing a reasonable 
angular resolution. 



3.1 Computation of Elements in the Network 

With each element p_i is associated a state variable denoted by E_{p_i} and a set of 
three attributes that includes its local saliency σ_i, its orientation θ_i and its attenuation 
factor ρ_i. Each element p_i updates its state variable E_{p_i} iteratively through a local 
computation. (We use here the notation E_{p_i} to indicate explicitly that the variable 
is associated with the element p_i in the network.) At the end of iteration N, E_{p_i} 
contains the measure of saliency derived in (2.6) which will be maximal over all possible 
curves of length N starting at p_i, where these curves are either continuous or with any 
number of gaps. 

E_{p_i} is updated by the following computation: 

\[ E_{p_i}^{(n+1)} = \sigma_i + \rho_i \max_{p_j \in \delta(p_i)} E_{p_j}^{(n)} f_{i,j} \tag{3.1} \]

where p_j is one of k possible neighbors of p_i, and f_{i,j} are the "coupling constants" 
defined in (2.5). To unravel the recurrence formula above, we isolate a specified curve γ 
represented by p_i,...,p_{i+N} where each element along the curve has only a single neighboring 
element to communicate with. The following proposition relates the value of the 
state variable of p_i with the measure in (2.6). 

Proposition 1: 

\[ E_{p_i}^{(N)} = \sum_{j=i}^{i+N} C_{i,j}\,\rho_{i,j}\,\sigma_j \]

The proof is by induction on the length of the curve and will not be detailed here. 
The proposition above together with the fact that the measure is extensible implies that 
among all possible curves γ_i of length N starting from p_i, either continuous or with any 
number of gaps, E_{p_i} will be computed along that curve which is maximal with respect 
to the measure in (2.6), namely 

\[ \max_{\gamma_i} \sum_{j} C_{i,j}\,\rho_{i,j}\,\sigma_j \]

taken over all γ_i. It is worth noting that the fact that the measure Φ is extensible does 
not imply that the optimal contour through P simply extends itself as the iterations 
proceed. In fact, the optimal curve at stage N + 1 can be different from the optimal 
curve at stage N. 
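
The update in (3.1) can be sketched in a few lines. The version below is a minimal illustration, not the implementation used for the experiments: the dictionary-based network, the element labels, and the coupling values are hypothetical, states are initialized to the local saliencies (as stated in the summary of section 5.1), and all elements are updated synchronously.

```python
def iterate_network(sigma, rho, neighbors, f, n_iter):
    """Apply the update (3.1) n_iter times.

    sigma[p]: local saliency, rho[p]: attenuation factor,
    neighbors[p]: elements that p may link to, f[(p, q)]: coupling constant.
    Returns the final states and, for each element, the neighbor that
    contributed most in the last iteration (None if no neighbor contributed).
    """
    state = dict(sigma)                      # initial states are the local saliencies
    preferred = {p: None for p in sigma}
    for _ in range(n_iter):
        new_state = {}
        for p in sigma:
            best_q, best_val = None, 0.0
            for q in neighbors[p]:
                val = state[q] * f[(p, q)]
                if val > best_val:
                    best_q, best_val = q, val
            new_state[p] = sigma[p] + rho[p] * best_val
            preferred[p] = best_q
        state = new_state                    # synchronous update of all elements
    return state, preferred


# Hypothetical example: a straight chain 0 -> 1 -> 2 -> 3 and a sharply bent branch 0 -> 4.
sigma = {p: 1.0 for p in range(5)}
rho = {p: 1.0 for p in range(5)}
neighbors = {0: [1, 4], 1: [2], 2: [3], 3: [], 4: []}
f = {(0, 1): 1.0, (0, 4): 0.5, (1, 2): 1.0, (2, 3): 1.0}

state, preferred = iterate_network(sigma, rho, neighbors, f, n_iter=3)
```

After three iterations element 0 has accumulated the contributions of the four elements on the straight chain and prefers neighbor 1 over the bent branch.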

The state values of elements in the network form a new representation of the image 
which is a 'biased' view of the visual environment, emphasizing interesting or conspicuous 
locations. We denote this representation as the saliency map. The term saliency 
map was used by Koch and Ullman (1986) for representing (using our terms) local 
saliency. 



3.2 Additional Properties of the Network 
Convergence of the State Values 

The concept of an iterative computation raises the issue of convergence when the 
number of iterations goes to infinity. This issue is important in the context of the 
saliency network because an element pi might be influenced by its own state in a feedback 
loop if it lies on a closed curve. The following proposition considers a closed curve and 
evaluates the state of an element of the curve after an infinite number of iterations. 

Proposition 2: 

Consider p_i,...,p_{i+N}, a closed curve where p_i = p_{i+N+1}. The state of p_i converges to 
the following value: 

\[ \lim_{k \to \infty} E_{p_i}^{(kN)} = \frac{E_{p_i}^{(N)}}{1 - C_{i,i+N}} \]

The proof is by induction on the length of the curve. The main point to notice is 
that a closed curve (even if it is fragmented) will increase its value when the number of 
iterations exceeds the curve's perimeter. If we consider a continuous circle of radius r, 
for example, then the curvature is 1/r along a perimeter of 2πr, so that 
C_{i,i+N} = e^{-2π/r}, which is always less than 1. In practice, the increase 
is considerably smaller than the limiting value because we perform a restricted number 
of iterations. 
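
The geometric-series behaviour behind Proposition 2 can be illustrated on a uniform closed curve. The sketch below makes simplifying assumptions that are not in the text: all M elements are active with unit local saliency and unit attenuation, every coupling has the same value f, and the loop factor is taken as the product of the couplings around the whole closed curve.

```python
def ring_state(f, n_iter):
    """State of any element on a uniform closed curve after n_iter updates of (3.1).

    By symmetry every element carries the same state, so the update reduces
    to the scalar recurrence e <- 1 + f * e, starting from e = 1.
    """
    e = 1.0
    for _ in range(n_iter):
        e = 1.0 + f * e
    return e


M, f = 12, 0.9                           # hypothetical ring size and coupling
one_lap = ring_state(f, M - 1)           # contributions of all M elements, counted once
loop_coupling = f ** M                   # product of couplings around the full loop
limit = one_lap / (1.0 - loop_coupling)  # limiting value in the form of Proposition 2
after_many = ring_state(f, 2000)
```

Each additional lap adds the one-lap contribution scaled by another factor of the loop coupling, so the state approaches the limit above; after only a few dozen iterations it is still well below that value.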



Tracing the Curve Starting From a Given Element 

The computation performed by each element includes a local preference between 
neighboring elements. That is, at each iteration each element pi selects the neighbor pj 
that contributes the most to its state. The information regarding local preference can 
be used to trace a linked curve starting from p_i in a recursive manner, namely, p_j is the 
second element in the curve, p_j's preferred neighbor is the third element, etc. Given 
a conspicuous element as a starting point, we could extract the curve that is optimal 
according to (2.6). Examples of these curves are given in section 4. 
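
Tracing can be sketched as following a table of preferred neighbors. The function and the table layout below are hypothetical, and the sketch uses a single table (for instance the preferences recorded at the last iteration), which is a simplification; it also stops on a revisit so that a closed curve is traced only once.

```python
def trace_curve(start, preferred, max_length):
    """Follow preferred-neighbor links from `start`.

    preferred[p] is the neighbor that contributed most to p's state, or None.
    Tracing stops at a dead end, when an element would be revisited, or when
    the curve reaches max_length elements.
    """
    curve, seen = [start], {start}
    current = start
    while len(curve) < max_length:
        nxt = preferred.get(current)
        if nxt is None or nxt in seen:
            break
        curve.append(nxt)
        seen.add(nxt)
        current = nxt
    return curve


open_curve = trace_curve(0, {0: 1, 1: 2, 2: 3, 3: None}, max_length=10)
closed_curve = trace_curve(0, {0: 1, 1: 2, 2: 0}, max_length=10)
```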



Filling Gaps by the Saliency Network 

The ability to cope with gaps is important for the applicability of the saliency 
network to real images. Edge maps obtained from real images are often corrupted 
by multiple gaps, and what seems to be a smooth salient curve often turns out to be 
fragmented after edge detection has been applied. 

A virtual element (that lies in a gap) participates in the computation of (3.1) in a 
similar manner to active elements. Consider for instance a gap starting from p_{j+1} and 
ending at p_{j+k}. That is, p_j is an active element, but p_{j+1},...,p_{j+k} are virtual elements. 
An element will update its state provided that it has at least one neighbor with a state 
value different from 0. It will take at most k iterations for p_{j+k} to update its state. 
The network will fill in a curve γ_j that will maximize the value of ρ^{|γ_j|} C_{j,j+k}. That is, 
the preference is for filled-in curves having low total curvature (high C_{j,j+k}), while minimizing 
their overall length |γ_j|. The relative weight of the two factors is controlled by setting 
the value of ρ. In the current implementation ρ was set to 0.7, which was found 
experimentally to give results that are generally in agreement with our own perception. 
The curves generated in this manner are similar (for orientation difference less than j) 
to several other methods for completing gaps in contours and for modelling subjective 
contours in human perception (Rutkowski 1979; Ullman 1976; Webb and Pervin 1984). 
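
The trade-off between length and curvature in gap filling can be seen by scoring two candidate fill-ins that both turn through a total of 90 degrees. The sketch is illustrative only: the function name is hypothetical, the element length is taken as 1, and the number of virtual elements is assumed equal to the number of orientation differences listed.

```python
import math


def fill_in_score(alphas, rho=0.7, ds=1.0):
    """Attenuation raised to the gap length, times the product of couplings (2.5)."""
    c = 1.0
    for a in alphas:
        c *= math.exp(-2.0 * a * math.tan(a / 2.0) / ds)
    return rho ** len(alphas) * c


short_sharp = [math.pi / 4] * 2    # two virtual elements, 45 degrees each
long_smooth = [math.pi / 8] * 4    # four virtual elements, 22.5 degrees each

short_at_07 = fill_in_score(short_sharp, rho=0.7)
long_at_07 = fill_in_score(long_smooth, rho=0.7)
short_at_09 = fill_in_score(short_sharp, rho=0.9)
long_at_09 = fill_in_score(long_smooth, rho=0.9)
```

With the attenuation at 0.7 the shorter, sharper fill-in scores slightly higher; raising it to 0.9 weakens the length penalty and the longer, smoother fill-in wins.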



3.3 Additional Computations Performed by the Network 



Measure of Saliency Based on Low Curvature Variation 

The computation of the network summarized in (3.1) produces a saliency map based 
on the measure in (2.6). This does not rule out the possibility of additional properties 
that mediate structural saliency. For instance, the blobs in Fig. 1 seem to be prominent 
on the basis of low curvature variation rather than low overall curvature. A second 
saliency measure was therefore formulated that prefers long curves with low total cur- 
vature variation. Details of this second measure can be found in (Sha'ashua 1988). As 
a result, the saliency network constructs two saliency maps, one for each property, from 
which salient locations can be detected. 



Smoothing the Measured Curves 

The input to the saliency network is an edge map that determines which of the 
network's elements are active. The edges in the edge map are often noisy, due to sensor 
noise, quantization effects, and various effects of the edge detection process. Reducing 
noise is important because what appears to be a smooth curve to our visual system may 
turn out to be rather serrated at the edge map level. Smoothing can be obtained in 
part by analyzing the same image at different resolutions. It turns out, however, that 
some smoothing is often desired within a given scale of analysis. 

A naive approach would be to extract all curves, replace them by a smooth approx- 
imation and then apply the saliency network to the smoothed curves. However, such 
an approach will encounter the same complexity issue regarding the number of possible 
curves discussed in section 1.1. We handle the problem of smoothing curves as a local 
computation that is performed within the saliency network itself, as an integral part 
of computing the saliency measure. In a nutshell, the coordinates associated with each 
orientation element are modified in an iterative manner, to smooth the curve passing 
through that element. The approach underlying the computation is to associate an energy 
level to each curve so that the smooth approximation is of minimum energy. The 
energy functional is given by: 

\[ \mathcal{E}(\gamma) = \frac{1}{2}\lambda \sum_{j=i}^{i+N} \bigl((x_j - x'_j)^2 + (y_j - y'_j)^2\bigr) + \frac{1}{2}\int_\gamma \Bigl(\frac{d\kappa}{ds}\Bigr)^2 ds \]

where (x_j, y_j), j = i,...,i+N, are the coordinates of the smooth approximation to the 
curve (x'_j, y'_j), j = i,...,i+N, and κ denotes the curvature along the smoothed curve. 
A curve of minimum energy is one that minimizes its 
total curvature variation while being as close as possible to the original curve. The 
parameter λ controls the relative weight between the two terms (for a similar energy 
functional see Poggio et al. (1985)). The energy is lowered at each iteration in a process 
that involves only local computations. These local computations are combined with 
those in (3.1), resulting in a network which measures saliency of curves while smoothing 
them simultaneously. The details can be found in (Sha'ashua 1988). 



§4 Examples of the Saliency Network at Work 

The main issues illustrated by the examples are (i) the saliency map, and (ii) the 
creation of linked curves as a by-product of the saliency computation. 

Prominent locations in the image are represented as elements having a high measure 
of saliency as computed by the network. For illustration purposes the saliency map will 
be displayed as a gray-level image in which an element p_i is displayed as a bar of width 
ω_i and intensity value τ_i, given by: 

\[ \tau_i = \frac{E_{p_i}}{\max_{p_j} E_{p_j}} \cdot 255, \qquad \omega_i = \frac{\tau_i}{255} \cdot 4 \]

In other words, increased saliency measure corresponds to an increase in brightness and 
in width of the element in the display. The most salient element is displayed as a white 
bar of width four, and the least salient element is displayed as a black segment. 
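
The display mapping is a simple normalization, sketched below. The function name and the dictionary of state values are hypothetical; the constants 255 and 4 are the maximum intensity and bar width described above.

```python
def display_attributes(states, max_intensity=255.0, max_width=4.0):
    """Map each state value to an (intensity, bar width) pair for display."""
    e_max = max(states.values())
    out = {}
    for p, e in states.items():
        # Intensity is the state value normalized by the largest state in the map.
        tau = max_intensity * e / e_max if e_max > 0 else 0.0
        # Width scales linearly with intensity, up to max_width.
        out[p] = (tau, max_width * tau / max_intensity)
    return out


shown = display_attributes({"a": 8.0, "b": 2.0, "c": 0.0})
```

The most salient element "a" is drawn at full intensity and width four, "b" at a quarter of both, and "c" as a black segment of zero width.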

The first example is a synthetic image (not produced by edge detection) shown 
in Fig. 2. It is constructed from a fragmented circle placed among a background of 
randomly placed and oriented elements. The number of background elements is 200 
and the circle consists of 60 elements. The circle is immediately perceived by our visual 
system. The saliency network is applied to this image for ten iterations. Fig. 5 presents 
the saliency map after that period, and Fig. 6 presents the selected curve starting from 
the most salient element. The result is in agreement with the perception of the circle 
by our visual system. The saliency measure of each element of the circle is significantly 
higher than the measure given to the background elements. In this regard, the circle 
virtually 'pops-out' from the saliency map. 

The second point to notice is that a complete object is separated from the back- 
ground although it is initially fragmented. This agrees with the observation that per- 
ception is not severely affected by the presence of gaps. The final point to notice is 
that although the length of the salient curve is 60 elements, the number of iterations 
required for distinguishing the circle from its background is considerably smaller. This 
happens because although each element of the circle is not salient by itself, groups of 
ten elements already become sufficiently salient. Outside the circle, the probability of 
having a low curvature chain of length ten is low. In fact, the probability remains small 
even when the number of background elements increases considerably. To illustrate, 
we doubled the number of background elements as shown in Fig. 7. We applied again 
ten iterations to produce the saliency map in Fig. 8. Starting from the most salient 
element, the curve extracted by the network is identical with the one in Fig. 6. 

The next example is the image in Fig. 3. Fig. 9 shows the saliency map after 
30 iterations. Only the region surrounding the car is displayed. The saliency measure 
given to most of the elements of the car is significantly higher than that given to the 
background elements. Fig. 10 displays the five most salient curves obtained by tracing 
the most salient elements. Note that the traced curves have been smoothed, and that 
the gaps have been filled in. The results suggest that the saliency computation is useful 
for distinguishing significant structures in the image. 

The final example is the image in Fig. 1a. The input to the network was obtained 
by edge detection from the original hand-drawn image. We show the results for a part 
of the image containing one of the blobs. Fig. 11 displays the saliency map for low 
curvature variation after 160 iterations, which is twice the number of elements on the 
perimeter of the blob. The elements of the blob become stronger than the background 
elements after 70 iterations, in agreement with the observation that one must capture 
almost the entire blob in order to perceive it as prominent. Interestingly, the results of 
the low curvature map are similar, but about 100 iterations are required for the blob to 
become prominent. Fig. 12 displays the curve starting from the most salient element. 
In this case also the curve is smoothed by the network while measuring its saliency. 



§5 Summary 



5.1 Brief Summary of the Scheme 

A measure of saliency S(P) is defined for the edge elements in the image. The 
saliency measure is used for detecting globally salient structures in the image. As a 
by-product, the process smoothly fills in gaps in fragmented contours and provides 
linking information between edge segments. 

Saliency of a Single Curve 

Let $\gamma$ be a curve and P an end element of the curve. The saliency of P given $\gamma$, 
$S_\gamma(P)$, is defined as: 

$$S_\gamma(P) = \sum_i w_i \sigma_i$$

where $\sigma_i$ is the local saliency of the i'th edge element along $\gamma$, and $w_i$ is the weight of 
the element's contribution. 

The weight $w_i$ is a product of two factors. The first factor is 

$$e^{-C_i}$$

where $C_i$ is the total curvature of the curve up to the i'th element. The second factor 
penalizes the existence of gaps, and is defined as: 

$$\prod_{k=0}^{i} \rho_k$$

where $\rho_k$ is the attenuation factor of the k'th element along the curve. One value is 
used for real edges, another for gaps in the contour. 
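
The single-curve measure can be illustrated with a minimal sketch. It assumes that the curve is given as per-element lists, that the total curvature C_i is accumulated as a running sum of per-element curvature values, and that the gap factor is the product of the attenuation factors from element 0 up to element i (the upper limit is an assumption). The function name `curve_saliency` and the numerical values in the usage note (local saliency 1 and attenuation 1 for real edges, local saliency 0 and attenuation 0.7 for gaps) are hypothetical and chosen only for illustration.

```python
import math


def curve_saliency(sigma, curvature, rho):
    """Saliency of one curve, with P taken as element 0.

    sigma[i]     : local saliency of the i-th element
    curvature[i] : curvature added at the i-th element (assumed discretization);
                   C_i is the running sum up to and including element i
    rho[i]       : attenuation factor of the i-th element
    """
    total = 0.0
    c_i = 0.0   # total curvature up to element i
    gap = 1.0   # running product of rho_k for k = 0..i (assumed upper limit)
    for s, c, r in zip(sigma, curvature, rho):
        c_i += c
        gap *= r
        total += math.exp(-c_i) * gap * s   # w_i * sigma_i
    return total
```

Under these assumptions a straight curve of three real elements, `curve_saliency([1, 1, 1], [0, 0, 0], [1, 1, 1])`, scores 3, whereas the same curve with a gap in the middle, `curve_saliency([1, 0, 1], [0, 0, 0], [1, 0.7, 1])`, scores 1.7: the gap contributes nothing itself and attenuates everything beyond it.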

The Saliency Measure 

The measure defined above depends on a particular curve $\gamma$. The saliency at P is 
given by: 

$$S(P) = \max_{\gamma} S_\gamma(P)$$

The maximum is taken over all possible curves terminating at P. In practice, the 
definition is limited to curves of length N: 

$$S_N(P) = \max_{\gamma_N} S_{\gamma_N}(P)$$

Note: the maximum is taken over all possible curves, including fragmented ones. As a 
by-product, curves are filled in. 

Operation of the Network 

$$E_i^{(n+1)} = \sigma_i + \rho_i \max_{j \in S(i)} E_j^{(n)} f_{ij}$$

$S(i)$ is the set of all neighbors of element i. The quantities $\rho_i$ and $f_{ij}$ ("couplings") are 
constants of the network. The initial input at step 0 is the set of local saliencies $\sigma_i$. At the 
(n + 1)'th iteration, each element simply adds the maximal contribution from its neighbors 
to its own local saliency. After N iterations the computation defined above yields 
the saliency measure $S_N(P)$ for each element P. 
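
The iteration can be sketched on a toy example and checked against an explicit maximization over curves. This is only an illustration under stated assumptions: the network is represented as a small directed graph (`neighbors[i]` lists the elements that may continue a curve from element i), the couplings are stored in a dictionary keyed by element pairs, and all names and numerical values are hypothetical. The weights used in `path_saliency` are the ones obtained by unrolling the update rule.

```python
def run_network(sigma, rho, neighbors, coupling, n_iter):
    """Apply E_i <- sigma_i + rho_i * max_j (E_j * f_ij) for n_iter steps."""
    energy = list(sigma)  # step 0: the local saliencies
    for _ in range(n_iter):
        energy = [
            sigma[i] + rho[i] * max(
                (energy[j] * coupling[(i, j)] for j in neighbors[i]),
                default=0.0,  # an element without neighbors keeps sigma_i
            )
            for i in range(len(sigma))
        ]
    return energy


def all_paths(i, neighbors, length):
    """Enumerate every curve of at most `length` steps starting at element i."""
    yield [i]
    if length > 0:
        for j in neighbors[i]:
            for tail in all_paths(j, neighbors, length - 1):
                yield [i] + tail


def path_saliency(path, sigma, rho, coupling):
    """Weighted sum of local saliencies along one explicit curve."""
    total, weight = 0.0, 1.0
    for pos, i in enumerate(path):
        total += weight * sigma[i]
        if pos + 1 < len(path):
            weight *= rho[i] * coupling[(i, path[pos + 1])]
    return total


# Toy input: chain 0 -> 1 -> 2 -> 3 with a gap at element 2,
# plus a strongly curved branch 0 -> 4 (all values are illustrative).
sigma = [1.0, 1.0, 0.0, 1.0, 1.0]
rho = [1.0, 1.0, 0.7, 1.0, 1.0]
neighbors = [[1, 4], [2], [3], [], []]
coupling = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 4): 0.5}

energy = run_network(sigma, rho, neighbors, coupling, n_iter=3)
```

For this input the value at element 0 after three iterations is 2.7 (up to floating-point rounding), which equals the largest `path_saliency` over all curves of at most three steps leaving element 0. The maximum is attained by the chain that bridges the gap, not by the curved branch.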



5.2 General Summary 

It is proposed that immediate perception includes processes for detecting salient 
structures in the image, on which subsequent processes such as segmentation and 
recognition can focus. The saliency of a structure has two sources: local saliency 
and structural saliency. Of the two, structural saliency is more problematic from a 
computational point of view, since it requires the efficient computation of certain global 
properties. 

A locally connected network was devised to produce a saliency map, which is a 
representation of the image emphasizing salient locations. The network exhibits the 
following properties: (i) the computations are local and simple, (ii) the number of 
iterations is on the order of dozens, or up to about a hundred, (iii) there is little 
dependence on the complexity of the image, (iv) gaps in curves are filled in the course 
of the computation, (v) contours are smoothed in the course of producing a saliency 
map, (vi) the network produces linking information so that curve tracing across 
junctions, branches and gaps is possible, and (vii) the network is robust in the sense that 
malfunction of some processing units does not seriously affect the performance of the 
network. 
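
Property (vi) can be illustrated by recording, for each element, the neighbor that supplied the maximal contribution, and then following these links from the most salient element. The sketch below is a simplification under stated assumptions, not necessarily the tracing procedure used to produce the figures: it keeps only the links of the final iteration, and the function names, graph representation and numerical values are hypothetical.

```python
def run_with_links(sigma, rho, neighbors, coupling, n_iter):
    """Iterate the network, remembering each element's preferred neighbor."""
    energy = list(sigma)
    links = [None] * len(sigma)
    for _ in range(n_iter):
        new_energy, new_links = [], []
        for i in range(len(sigma)):
            best_j, best_val = None, 0.0
            for j in neighbors[i]:
                val = energy[j] * coupling[(i, j)]
                if best_j is None or val > best_val:
                    best_j, best_val = j, val
            new_energy.append(sigma[i] + rho[i] * best_val)
            new_links.append(best_j)  # None when the element has no neighbor
        energy, links = new_energy, new_links
    return energy, links


def trace(start, links, max_steps):
    """Follow the preferred-neighbor links from `start` for at most max_steps."""
    curve = [start]
    while links[curve[-1]] is not None and len(curve) - 1 < max_steps:
        curve.append(links[curve[-1]])
    return curve


# Toy input: chain 0 -> 1 -> 2 -> 3 with a gap at element 2,
# plus a strongly curved branch 0 -> 4 (all values are illustrative).
sigma = [1.0, 1.0, 0.0, 1.0, 1.0]
rho = [1.0, 1.0, 0.7, 1.0, 1.0]
neighbors = [[1, 4], [2], [3], [], []]
coupling = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 4): 0.5}

energy, links = run_with_links(sigma, rho, neighbors, coupling, n_iter=3)
start = max(range(len(energy)), key=energy.__getitem__)
curve = trace(start, links, max_steps=3)
```

In this toy case the traced curve is `[0, 1, 2, 3]`: it starts at the most salient element, ignores the curved branch at the junction, and passes through the gap element 2, so the gap is filled in as part of the tracing.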



Acknowledgement: We thank E. Grimson for his comments. 
Figure 5. Saliency map of the image in Fig. 2 obtained by the network after 10 iterations. 
The saliency measure of each element of the circle is significantly higher than that of the 
background elements. 

Figure 6. The curve starting from the strongest element in figure 5. Virtual elements are 
displayed as dotted lines. 
Figure 7. The same circle as in figure 2 but with 400 background segments. 

Figure 8. Saliency map of the image in Fig. 7 obtained by the network after 10 iterations. 
Figure 9. Saliency map of the image in Fig. 3 obtained by the network after 30 iterations. The 
region of interest virtually 'pops-out' from the display. 

Figure 10. The five most salient curves obtained by tracing the most salient elements of figure 
9. The curves have been smoothed and gaps have been filled in. 





Figure 11. Saliency map for low curvature variation of the image in Fig. 1 



Figure 12. The curve starting from the strongest element in figure 11 is traced. The curve is 
smoothed by the network while measuring its saliency. 